In most practical situations the data at hand do not follow the normal distribution. If they did, one could use all of the procedures of multivariate analysis that apply to normally distributed data. This paper discusses a routine method for making data follow the normal distribution more closely: transforming them.

The idea of transforming data before analyzing them is well known. Two of the most common transformations, usually introduced in an elementary analysis of variance course, are the log transformation and the reciprocal transformation. Both are special cases of the family of power transformations introduced by Box and Cox (1964), which is given by

$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[4pt] \log y, & \lambda = 0. \end{cases}$$
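As a quick illustration, the sketch below evaluates the family for a few values of λ. It is a minimal plain-Python version written for this document, and the function name `box_cox` is ours rather than Box and Cox's. It shows three things: λ = 0 gives the logarithm, λ = −1 gives 1 − 1/y (a linear function of the reciprocal), and the family is continuous at λ = 0.

```python
import math


def box_cox(y, lam):
    """Box-Cox power transform of positive observations.

    Returns (y**lam - 1)/lam for lam != 0 and log(y) for lam == 0.
    """
    if any(v <= 0 for v in y):
        raise ValueError("the Box-Cox transform needs positive observations")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


y = [1.0, 2.0, 4.0]
print(box_cox(y, 0))      # the log transformation
print(box_cox(y, -1))     # 1 - 1/y: the reciprocal, up to a linear change
print(box_cox(y, 1e-8))   # close to the logs: the family is continuous at 0
```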

Several methods are available for estimating the value of λ that makes the data most nearly normal. These estimators, and related tests of the hypothesis $\lambda = \lambda_0$, are discussed in the second part of the paper.

Also included is a summary of the analysis of two experiments introduced in the paper by Box and Cox.

The power of the tests is studied briefly, using the results of an example in a paper by Atkinson (1973).

In the third section we discuss the influence curve of an estimator, which indicates how much a single observation affects the estimator. The influence curves of several robust estimators of location are given to introduce the concept. The idea of the influence curve is then applied to the estimators of λ.

Another purpose of the paper is to study further the power of the tests introduced in the second section. This study follows the method outlined in the example in the paper by Atkinson. That example, which uses the biological data from the Box and Cox paper, is the basis for a series of simulations. The simulations use data that are not normally distributed, to indicate the robustness of the tests as well as their power under distributions other than the normal. The results of these simulations are given later in the paper.
CHAPTER II


ESTIMATES OF THE BEST TRANSFORMATION


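Box and Cox (1964) worked with the observations arranged in an n × 1 vector y, and defined the family of transformations

$$y^{(\lambda)} = \big(y_i^{(\lambda)}\big) = \begin{cases} \dfrac{y_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[4pt] \log y_i, & \lambda = 0, \end{cases} \qquad (1)$$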


where λ is a parameter to be estimated. They assumed that for some λ the transformed observations $y^{(\lambda)}$ would be normally distributed and would satisfy the linear model

$$E\big[y^{(\lambda)}\big] = X\beta,$$

with constant error variance and no interactions. Here X is a matrix of constants and β is a vector of parameters.

The problem that we wish to examine is the estimation of λ. In their paper, Box and Cox used the method of maximum likelihood to find the value of λ for which the data most closely follow a normal model with constant variance and no interactions.

If $y^{(\lambda)}$ is the vector of transformed observations, the maximized log likelihood is

$$L_{\max}(\lambda) = -\frac{n}{2}\log\hat\sigma^2(\lambda) + (\lambda - 1)\sum_{i=1}^{n}\log y_i, \qquad (2)$$

where $\hat\sigma^2(\lambda) = y^{(\lambda)\prime} A\, y^{(\lambda)}/n$ is the estimate of the error variance $\sigma^2$ and $A = I - X(X'X)^{-1}X'$.

There are two ways of using this maximized log likelihood to estimate λ. One way is to plot $L_{\max}(\lambda)$ against λ, find the point that maximizes the likelihood, and choose that point as the estimate $\hat\lambda$ of λ.
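The sketch below is a minimal numerical stand-in for the graphical method. It evaluates (2) on a grid and keeps the grid point with the largest value. To keep it short, it assumes the simplest linear model, a common mean (X a column of ones, the location problem). Under that model, A only removes the mean. The grid of width 0.01 on [−2, 2], the helper names and the data are illustrative assumptions.

```python
import math


def box_cox(y, lam):
    # transformation (1); the log when lam == 0
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def lmax(y, lam):
    """Maximized log likelihood (2) under the location model X = (1, ..., 1)'.

    With a single column of ones, A z is z minus its mean, so
    sigma^2(lam) is the mean squared deviation of the transformed data.
    """
    n = len(y)
    z = box_cox(y, lam)
    zbar = sum(z) / n
    sigma2 = sum((v - zbar) ** 2 for v in z) / n
    return -0.5 * n * math.log(sigma2) + (lam - 1.0) * sum(math.log(v) for v in y)


def lambda_hat_grid(y, lo=-2.0, hi=2.0, steps=401):
    """Numerical stand-in for plotting Lmax and reading off its maximum."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda lam: lmax(y, lam))


y = [1.2, 0.8, 3.5, 2.2, 5.1, 1.7, 0.9, 2.8]   # hypothetical positive data
print(lambda_hat_grid(y))
```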

Another way to find $\hat\lambda$ is to take the derivative of the likelihood function, equate it to zero, and solve for λ. This method should be used if more precision is desired than plotting the likelihood function gives. The derivative of $L_{\max}(\lambda)$ with respect to λ is

$$\frac{dL_{\max}(\lambda)}{d\lambda} = -n\,\frac{y^{(\lambda)\prime} A\, u^{(\lambda)}}{y^{(\lambda)\prime} A\, y^{(\lambda)}} + \frac{n}{\lambda} + \sum_{i=1}^{n}\log y_i, \qquad (3)$$

where $u^{(\lambda)}$ is the vector with components $\lambda^{-1} y_i^{\lambda}\log y_i$. Multiplying through by $y^{(\lambda)\prime} A\, y^{(\lambda)}$ gives a form that is easier to use:

$$-n\,y^{(\lambda)\prime} A\, u^{(\lambda)} + \left(\frac{n}{\lambda} + \sum_{i=1}^{n}\log y_i\right) y^{(\lambda)\prime} A\, y^{(\lambda)}.$$

Equating this to zero, we obtain the maximum likelihood estimate of λ.
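As a check on (3), the sketch below codes the derivative and compares it with a finite difference of (2). It then finds the root by bisection. It again assumes the location model X = 1, and it uses bisection rather than any method from the thesis. Because the n/λ term is not defined at λ = 0 itself, the bracket is chosen to avoid that point. The bracket, the helper names and the data are illustrative assumptions.

```python
import math


def box_cox(y, lam):
    # transformation (1); the log when lam == 0
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def lmax(y, lam):
    # maximized log likelihood (2) under the location model X = 1
    n = len(y)
    z = box_cox(y, lam)
    zbar = sum(z) / n
    return (-0.5 * n * math.log(sum((v - zbar) ** 2 for v in z) / n)
            + (lam - 1.0) * sum(math.log(v) for v in y))


def dlmax(y, lam):
    """Derivative (3) of Lmax under the location model; needs lam != 0."""
    n = len(y)
    z = box_cox(y, lam)
    u = [v ** lam * math.log(v) / lam for v in y]   # components of u^(lam)
    zbar = sum(z) / n
    r = [v - zbar for v in z]                        # A y^(lam): deviations from the mean
    yAy = sum(ri * ri for ri in r)
    yAu = sum(ri * ui for ri, ui in zip(r, u))       # A is symmetric, so y'Au = (Ay)'u
    return -n * yAu / yAy + n / lam + sum(math.log(v) for v in y)


def solve_score(y, lo, hi, tol=1e-10):
    """Bisection for dLmax/dlam = 0 on [lo, hi]; assumes a sign change there."""
    f_lo = dlmax(y, lo)
    if f_lo * dlmax(y, hi) > 0:
        raise ValueError("no sign change on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = dlmax(y, mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


y = [1.2, 0.8, 3.5, 2.2, 5.1, 1.7, 0.9, 2.8]   # hypothetical positive data
lam_hat = solve_score(y, -9.7, 10.3)           # bracket chosen so no midpoint is lam = 0
print(lam_hat, dlmax(y, lam_hat))
```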

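To simplify the problem we could consider the normalized observations

$$z^{(\lambda)} = \frac{y^{(\lambda)}}{J^{1/n}}, \qquad (4)$$

where $J = J(\lambda; y) = \prod_{i=1}^{n}\left|\dfrac{dy_i^{(\lambda)}}{dy_i}\right| = \prod_{i=1}^{n} y_i^{\lambda-1}$ is the Jacobian of the transformation. Using the transformations (1), we now get

$$L_{\max}(\lambda) = -\frac{n}{2}\log\hat\sigma^2(\lambda; z), \qquad \hat\sigma^2(\lambda; z) = \frac{z^{(\lambda)\prime} A\, z^{(\lambda)}}{n}.$$

Here

$$z^{(\lambda)} = \big(z_i^{(\lambda)}\big) = \begin{cases} \dfrac{y_i^{\lambda} - 1}{\lambda\,\dot y^{\lambda-1}}, & \lambda \neq 0, \\[4pt] \dot y \log y_i, & \lambda = 0, \end{cases} \qquad (5)$$

where $\dot y$ is the geometric mean of the observations.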
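Since $J^{1/n} = \dot y^{\lambda-1}$, dividing by it absorbs the Jacobian term of (2) into the variance. The sketch below checks numerically that the normalized form and form (2) give the same value. It assumes the location model X = 1, and the helper names and data are illustrative.

```python
import math


def normalized_box_cox(y, lam):
    """z^(lam) of (5): y^(lam) divided by gdot**(lam - 1), gdot the geometric mean."""
    n = len(y)
    gdot = math.exp(sum(math.log(v) for v in y) / n)
    if lam == 0:
        return [gdot * math.log(v) for v in y]
    return [(v ** lam - 1.0) / (lam * gdot ** (lam - 1.0)) for v in y]


def lmax_normalized(y, lam):
    # -(n/2) log sigma^2(lam; z); location model X = 1 assumed, so A centres z
    z = normalized_box_cox(y, lam)
    n = len(z)
    zbar = sum(z) / n
    return -0.5 * n * math.log(sum((v - zbar) ** 2 for v in z) / n)


def lmax(y, lam):
    # form (2), with the Jacobian term written out, for comparison
    n = len(y)
    z = [math.log(v) for v in y] if lam == 0 else [(v ** lam - 1.0) / lam for v in y]
    zbar = sum(z) / n
    return (-0.5 * n * math.log(sum((v - zbar) ** 2 for v in z) / n)
            + (lam - 1.0) * sum(math.log(v) for v in y))


y = [1.2, 0.8, 3.5, 2.2, 5.1, 1.7, 0.9, 2.8]   # hypothetical data
for lam in (-1.0, 0.0, 0.5):
    print(lam, lmax_normalized(y, lam), lmax(y, lam))
```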

Box and Cox considered two examples in their paper, a biological example and a textile example. The biological example was a 3 × 4 factorial experiment with the two factors poisons and treatments. They used the transformations (1) and found $\hat\lambda = -0.75$ to be the maximum likelihood estimate of λ. The familiar reciprocal transformation with λ = −1 was nevertheless used, because the results were easier to analyze with this transformation and because a more familiar transformation is customarily preferred. The textile example was a $3^3$ factorial experiment. For this example they considered the normalized observations $z^{(\lambda)}$ and found the maximum likelihood estimate of λ to be $\hat\lambda = -0.06$. They therefore used the log transformation, λ = 0, because the data were easier to analyze that way and because the log transformation is more familiar.

One problem with the method used by Box and Cox is the assumption that, for some λ, the transformed observations follow a normal model with constant error variance and no interactions. It is hard to believe that, for every data vector to be considered, there exists a value of λ that satisfies these conditions exactly.

A paper by Draper and Cox (1969) addressed this problem. Their conjecture was that, even if these three conditions could not all be satisfied exactly for any λ, the estimate of λ found by the Box and Cox method might still be of interest. They showed by example that, for this estimate of λ, the resulting distribution was close enough to normal to be useful. One problem they found was that the sample size had to be quite large before the resulting estimate of λ was precise, where precision was measured by the variance of the estimate $\hat\lambda$.

Another paper that considered the Box and Cox transformations was by Schlesselman (1971). He stated that the maximum likelihood estimate is not invariant under rescaling of the original observations from y to cy, for a constant c > 0, unless the X matrix of the linear model contains a column of ones or, in other words, unless an additive constant can be removed from the model. In most practical situations the model is defined so that the X matrix does allow for removal of an additive constant; therefore this problem is not important.
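With a constant column in X, rescaling y to cy turns $y^{(\lambda)}$ into $c^{\lambda}y^{(\lambda)} + (c^{\lambda}-1)/\lambda$. The additive part is absorbed by the constant. The factor $c^{\lambda}$ adds $-n\lambda\log c$ to the first term of (2), while the Jacobian term gains $(\lambda-1)n\log c$, so $L_{\max}$ shifts by $-n\log c$ whatever the value of λ, and $\hat\lambda$ is unchanged.

The sketch below checks this numerically. The "with constant" case is the location model. The contrast case is a model with no constant term at all (here simply $E[y^{(\lambda)}] = 0$, chosen only to make the difference visible). Both models, the data and the helper names are illustrative assumptions.

```python
import math


def box_cox(y, lam):
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def lmax(y, lam, intercept=True):
    """Maximized log likelihood (2) for a common-mean model (intercept=True)
    or, as a contrast, a model E[y^(lam)] = 0 with no constant term."""
    n = len(y)
    z = box_cox(y, lam)
    centre = sum(z) / n if intercept else 0.0
    sigma2 = sum((v - centre) ** 2 for v in z) / n
    return -0.5 * n * math.log(sigma2) + (lam - 1.0) * sum(math.log(v) for v in y)


y = [1.2, 0.8, 3.5, 2.2, 5.1, 1.7, 0.9, 2.8]   # hypothetical data
c = 10.0
cy = [c * v for v in y]
for lam in (-1.0, 0.0, 0.5, 2.0):
    with_constant = lmax(cy, lam) - lmax(y, lam)
    without_constant = lmax(cy, lam, intercept=False) - lmax(y, lam, intercept=False)
    print(lam, with_constant, without_constant)
# With the constant, every shift equals -n*log(c), so the maximizing lam is unchanged.
```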

In a paper by Atkinson (1973), the Box and Cox maximum likelihood test statistic was expressed in the form

$$T_L = \operatorname{sign}(\hat\lambda - \lambda_0)\,\Big\{-2\,\big[L_{\max}(\lambda_0) - L_{\max}(\hat\lambda)\big]\Big\}^{1/2}, \qquad (6)$$

where $\hat\lambda$ is the maximum likelihood estimate of λ and $\lambda_0$ is the value of λ specified by the null hypothesis. Since $T_L^2$ has an asymptotic $\chi^2_1$ distribution, $T_L$ has an asymptotic standard normal distribution. Because this distribution is only asymptotic, a test with an exact distribution was desired, so that exact tests could be made instead of tests that depend on asymptotic theory.
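The sketch below computes (6) and an asymptotic two-sided p-value from the standard normal distribution. It assumes the location model X = 1 and takes $\hat\lambda$ from a grid, so $\hat\lambda$ is only approximate. For that reason a tiny negative deviance is clipped to zero. The grid, the helper names and the data are illustrative assumptions.

```python
import math
from statistics import NormalDist


def lmax(y, lam):
    # maximized log likelihood (2) under the location model X = 1 (assumed)
    n = len(y)
    z = [math.log(v) for v in y] if lam == 0 else [(v ** lam - 1.0) / lam for v in y]
    zbar = sum(z) / n
    return (-0.5 * n * math.log(sum((v - zbar) ** 2 for v in z) / n)
            + (lam - 1.0) * sum(math.log(v) for v in y))


def lambda_hat_grid(y, lo=-3.0, hi=3.0, steps=601):
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda lam: lmax(y, lam))


def t_l(y, lam0):
    """Signed root (6) of the likelihood ratio statistic for H0: lam = lam0."""
    lam_hat = lambda_hat_grid(y)
    # the grid maximum is approximate, so guard against a tiny negative value
    dev = max(0.0, 2.0 * (lmax(y, lam_hat) - lmax(y, lam0)))
    sign = (lam_hat > lam0) - (lam_hat < lam0)
    return sign * math.sqrt(dev)


y = [1.2, 0.8, 3.5, 2.2, 5.1, 1.7, 0.9, 2.8]   # hypothetical data
t = t_l(y, -1.0)
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t)))   # asymptotic N(0, 1) reference
print(t, p_value)
```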


Andrews, in a 1971 paper, derived an exact test statistic for testing $\lambda = \lambda_0$ that also had the advantage of being computationally simpler than the maximum likelihood statistic. To derive his test statistic, Andrews started with the transformations (1) and assumed that for some λ the vector of transformed observations $y^{(\lambda)}$ could be expressed in the form

$$y^{(\lambda)} = X\beta + e, \qquad (7)$$

where X and β are as defined previously and e is the vector of errors, with mean 0 and variance $\sigma_y^2$.

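He assumed that the values $y^{(\lambda)}$, which are the transformed observations at the true value of λ, follow a Taylor expansion about $\lambda_0$, which gives

$$y^{(\lambda_0)} = X\beta + v\,(\lambda_0 - \lambda) + e, \qquad (8)$$

where the remainder terms in higher powers of $(\lambda - \lambda_0)$ were ignored. The vector v was defined by

$$v = (v_i) = \left.\frac{\partial y_i^{(\lambda)}}{\partial \lambda}\right|_{\lambda = \lambda_0}. \qquad (9)$$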

The vector v depends on y, so it must be modified before a test statistic can be constructed. This was done by calculating the vector $\tilde v$ defined by

$$\tilde v = (\tilde v_i) = \left.\frac{\partial y_i^{(\lambda)}}{\partial \lambda}\right|_{\lambda = \lambda_0,\; y = \hat y}, \qquad (10)$$

where the values $\hat y$ are the fitted values of y, given by $\hat y = X(X'X)^{-1}X'y$. The test statistic was then derived using the method given in Milliken and Graybill (1970). The resulting statistic, as expressed in Atkinson, was

$$T_A = -\frac{\tilde v' A\, y^{(\lambda_0)}}{s_y\,(\tilde v' A\, \tilde v)^{1/2}}, \qquad (11)$$

where $s_y^2$ is an estimate of the error variance $\sigma_y^2$. This statistic has an exact t distribution; it would have a standard normal distribution if the variance $\sigma_y^2$ were known.
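Statistic (11) has the form of the usual t statistic for the coefficient of $\tilde v$ when $\tilde v$ is added to the model as an extra column. The sketch below computes it under several assumptions, each of which is ours and not the thesis's:

- X is a straight-line design [1, x]. In the location or one-way model, $\tilde v$ would be constant within cells and would lie in the column space of X, so $A\tilde v = 0$ and the statistic would be undefined.
- "Fitted values" means the fitted values of $y^{(\lambda_0)}$, mapped back to the original scale. This keeps $\tilde v$ a function of the fitted values alone, which is, as we understand it, what the Milliken and Graybill argument relies on.
- $s_y^2$ is the residual mean square after $\tilde v$ is added as a regressor.

The helper names and data are hypothetical.

```python
import math


def bc(v, lam):
    # Box-Cox transform (1) of a single positive value
    return math.log(v) if lam == 0 else (v ** lam - 1.0) / lam


def bc_inverse(z, lam):
    # inverse of (1); requires 1 + lam*z > 0 when lam != 0
    if lam == 0:
        return math.exp(z)
    base = 1.0 + lam * z
    if base <= 0:
        raise ValueError("fitted value outside the range of the transform")
    return base ** (1.0 / lam)


def dbc_dlam(v, lam):
    # derivative of (1) with respect to lam, as in (9) and (10)
    lv = math.log(v)
    if lam == 0:
        return lv * lv / 2.0
    return v ** lam * lv / lam - (v ** lam - 1.0) / lam ** 2


def line_residuals(x, w):
    """A w for the straight-line model X = [1, x] (an illustrative choice of X)."""
    n = len(x)
    xbar, wbar = sum(x) / n, sum(w) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    b = sum((a - xbar) * (c - wbar) for a, c in zip(x, w)) / sxx
    return [c - wbar - b * (a - xbar) for a, c in zip(x, w)]


def andrews_t(x, y, lam0):
    n, p = len(y), 2
    z = [bc(v, lam0) for v in y]
    rz = line_residuals(x, z)
    # fitted values of y^(lam0), mapped back to the y scale (our reading of y-hat)
    yhat = [bc_inverse(zi - ri, lam0) for zi, ri in zip(z, rz)]
    v_tilde = [dbc_dlam(v, lam0) for v in yhat]
    rv = line_residuals(x, v_tilde)
    vAz = sum(a * b for a, b in zip(rv, rz))
    vAv = sum(a * a for a in rv)
    # residual mean square after adding v-tilde as an extra regressor
    s2 = (sum(r * r for r in rz) - vAz ** 2 / vAv) / (n - p - 1)
    return -vAz / (math.sqrt(s2) * math.sqrt(vAv))


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]                     # hypothetical design
y = [1.1, 1.4, 1.3, 2.0, 2.6, 2.4, 3.9, 4.6, 5.2, 7.8]  # hypothetical responses
for lam0 in (1.0, 0.0, -1.0):
    print(lam0, andrews_t(x, y, lam0))   # refer to t with n - p - 1 = 7 d.f.
```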

One attractive aspect of Andrews' test is his claim that it is less sensitive to outliers and, by implication, to distributions with heavier tails than the normal. Andrews supports this claim by analyzing both of Box and Cox's examples with his test, and also the biological example with one additional outlier added. His test is affected much less than the maximum likelihood test by the addition of the outlier. One purpose of this work is to make a more formal study of Andrews' conjecture, by analyzing Box and Cox's examples with errors drawn from distributions with heavier tails than the normal.

Atkinson's paper compared three tests: the Box and Cox test, the Andrews test and a test derived by Atkinson himself. Atkinson considered a further test for two reasons. He wanted a test that was easy to compute and had higher power than the others. He also wanted a test that did not neglect the remainder terms, as the Andrews test did.
Atkinson expressed his test in the following form, using the transformed observations $z^{(\lambda)}$ given in (5):

$$T_P = -\frac{w' A\, z^{(\lambda_0)}}{s_z\,(w' A\, w)^{1/2}}, \qquad (12)$$

where $w = \partial z^{(\lambda)}/\partial\lambda$ evaluated at $\lambda = \lambda_0$, and $s_z^2$ is an estimate of the variance of the values $z^{(\lambda_0)}$. The test he derived was a form of the locally most powerful test.
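The sketch below computes (12) for the same kind of illustrative straight-line design used for Andrews' statistic. The derivative w follows from the product rule applied to $z^{(\lambda)} = y^{(\lambda)}\dot y^{\,1-\lambda}$, with the geometric mean $\dot y$ held fixed. We take $s_z^2$ to be the residual mean square of $z^{(\lambda_0)}$ from the fit of X alone. That is one plausible reading of "an estimate of the variance of the values $z^{(\lambda_0)}$", not something the text confirms. The design, data and helper names are hypothetical.

```python
import math


def line_residuals(x, w):
    """A w for the straight-line model X = [1, x] (an illustrative choice)."""
    n = len(x)
    xbar, wbar = sum(x) / n, sum(w) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    b = sum((a - xbar) * (c - wbar) for a, c in zip(x, w)) / sxx
    return [c - wbar - b * (a - xbar) for a, c in zip(x, w)]


def normalized(y, lam):
    """z^(lam) of (5) and its derivative w = dz/dlam; gdot is held fixed."""
    logs = [math.log(v) for v in y]
    gdot = math.exp(sum(logs) / len(y))        # geometric mean
    lg = math.log(gdot)
    scale = gdot ** (1.0 - lam)
    if lam == 0:
        yl = logs                              # y^(0) = log y
        dyl = [l * l / 2.0 for l in logs]      # d y^(lam)/d lam at lam = 0
    else:
        p = [v ** lam for v in y]
        yl = [(a - 1.0) / lam for a in p]
        dyl = [a * l / lam - (a - 1.0) / lam ** 2 for a, l in zip(p, logs)]
    z = [scale * a for a in yl]
    # product rule: z = y^(lam) * gdot^(1 - lam)
    w = [scale * (d - a * lg) for d, a in zip(dyl, yl)]
    return z, w


def atkinson_t(x, y, lam0):
    n, p = len(y), 2
    z, w = normalized(y, lam0)
    rz = line_residuals(x, z)
    rw = line_residuals(x, w)
    s2 = sum(r * r for r in rz) / (n - p)      # variance of z under H0 (assumed)
    wAz = sum(a * b for a, b in zip(rw, rz))
    wAw = sum(a * a for a in rw)
    return -wAz / (math.sqrt(s2) * math.sqrt(wAw))


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]                     # hypothetical design
y = [1.1, 1.4, 1.3, 2.0, 2.6, 2.4, 3.9, 4.6, 5.2, 7.8]  # hypothetical responses
for lam0 in (1.0, 0.0, -1.0):
    print(lam0, atkinson_t(x, y, lam0))
```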

In order to compare the three test statistics $T_L$, $T_A$ and $T_P$, Atkinson performed a series of simulations using the model given in Box and Cox's biological example. To determine the power of these tests, simulations were performed using different values of λ, and in each case the percentage of tests that were significant was counted. He also gave a plot of these results. It indicated that Andrews' test $T_A$ was much less powerful than the other two, especially when the true value of λ was far from the hypothesized value, but that the other two tests were similar in power.

It is questionable how general the results in Atkinson's paper are, because he considers only one numerical example. He mentions this problem briefly, but since he has done only this one example, the conclusions must be based on its results. Later in this paper, the results of further simulations performed in the same manner, using distributions other than the normal, will be given.

All of the tests considered thus far were constructed on the assumption that, for some λ, the transformed observations $y^{(\lambda)}$ follow a normal distribution. In a later paper, Hinkley (1975) used robust analysis to find another way to estimate λ. He did not assume any distribution for the transformed observations. Instead, he sought a value of λ for which the transformed observations had a symmetric distribution.

Suppose there are n independent and identically distributed random variables $Y_1, \ldots, Y_n$. The desired value of λ is the one for which the p and 1 − p quantiles of the transformed variables are symmetric about the median. Since this will be expressed in terms of the ordered values of $Y_1, \ldots, Y_n$, these will be denoted by $x_{(1)} \le \cdots \le x_{(n)}$. The desired value of λ is then the one for which

$$\tilde x^{(\lambda)} - x_{(r)}^{(\lambda)} = x_{(n-r+1)}^{(\lambda)} - \tilde x^{(\lambda)}, \qquad (13)$$

where $r = [np]$ and $\tilde x$ is the median of the random variables. In its power form, this equation has two solutions: λ = 0 and another solution, which Hinkley calls T. He excludes the value λ = 0 unless $x_{(r)}\,x_{(n-r+1)} = \tilde x^{2}$, and he rewrites the equation as

$$\left(\frac{x_{(r)}}{\tilde x}\right)^{\lambda} + \left(\frac{x_{(n-r+1)}}{\tilde x}\right)^{\lambda} = 2. \qquad (14)$$

Hinkley states that the estimate T of λ has an asymptotic normal distribution, and he derives the asymptotic variance of T.
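Equation (14) is easy to solve numerically. Write $a = x_{(r)}/\tilde x$ and $b = x_{(n-r+1)}/\tilde x$. The left side of (14) minus 2 is convex in λ and vanishes at λ = 0. It follows that its chord slope from the origin, $g(\lambda) = (a^\lambda + b^\lambda - 2)/\lambda$, is increasing, with $g(0) = \log(ab)$. The single zero of g is therefore T, and g(0) = 0 exactly when $x_{(r)}x_{(n-r+1)} = \tilde x^2$.

The sketch below solves for T by bisection on g. The bracketing scheme and helper names are our own illustrative choices, and $r = \lfloor np \rfloor$ follows the text's $[np]$.

```python
import math
import statistics


def hinkley_root(a, b, tol=1e-12, max_doublings=60):
    """Nonzero solution T of a**lam + b**lam = 2, equation (14).

    g(lam) = (a**lam + b**lam - 2) / lam is the slope of the chord from the
    origin to a convex curve, so it is increasing, with g(0) = log(a*b).
    """
    def g(lam):
        if lam == 0:
            return math.log(a * b)
        try:
            return (a ** lam + b ** lam - 2.0) / lam
        except OverflowError:           # a**lam or b**lam is huge and positive
            return math.copysign(math.inf, lam)

    g0 = g(0.0)
    if g0 == 0:
        return 0.0                      # x_(r) x_(n-r+1) = median^2: lam = 0 is kept
    far = -1.0 if g0 > 0 else 1.0       # g increases, so the zero is on this side
    for _ in range(max_doublings):
        if (g(far) > 0) != (g0 > 0):
            break
        far *= 2.0
    else:
        raise ValueError("no nonzero root found")
    lo, hi = (far, 0.0) if far < 0 else (0.0, far)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def hinkley_estimate(sample, p):
    """Estimate from the order statistics x_(r), x_(n-r+1), with r = [np]."""
    xs = sorted(sample)
    n = len(xs)
    r = math.floor(n * p)
    if not 1 <= r <= n // 2:
        raise ValueError("need 1 <= [np] <= n/2")
    m = statistics.median(xs)
    return hinkley_root(xs[r - 1] / m, xs[n - r] / m)


print(hinkley_estimate([1.0, 4.0, 9.0], 0.34))   # square roots 1, 2, 3 are symmetric
```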

In his discussion, Hinkley also states that problems may arise when more complex models are used. He refers to the Box and Cox biological example and states that different estimates of λ may be found according to which sets of cell means are examined. This is a serious problem, because most models that we wish to analyze will be similar to the Box and Cox example, and we need an estimate of λ that can also be used in these cases. The Hinkley estimate is therefore useful only in certain simplified cases, and not in more complex ones.

We see that, of the four estimates and test statistics we have considered, the evidence to date indicates that the Atkinson test would probably be preferred over the other three. It is easier to compute than the maximum likelihood statistic, possibly more powerful than Andrews' test statistic, and useful in more cases than the Hinkley estimate.

In this paper, the conjectures made by these authors will be analyzed further. We have seen that problems may occur when distributions with heavier tails than the normal distribution are considered. The results of simulations performed using such distributions will be given, and the power of the Andrews and Atkinson test statistics will be considered further.


CHAPTER III


INFLUENCE CURVES

The influence curve is a useful way of representing how a single observation affects an estimator. It indicates how this single observation, which may be an outlier, changes the value of the estimator, so it is a measure of robustness. It is essentially the first derivative of the estimator, viewed as a functional of the distribution and evaluated at a particular distribution.

The influence curve will be denoted IC(x; T, F), where T is the estimator of interest and F is the distribution at which it is evaluated. Let $\delta_x$ be the function defined by

$$\delta_x(y) = \begin{cases} 0, & y < x, \\ 1, & y \ge x, \end{cases}$$

that is, the distribution function of a point mass at x. If we view T as a functional depending on F, and denote it T(F), the influence curve is defined in Hampel (1974) as

$$IC(x; T, F) = \lim_{\varepsilon \to 0}\frac{T\big((1-\varepsilon)F + \varepsilon\,\delta_x\big) - T(F)}{\varepsilon}. \qquad (1)$$

Thus the influence curve is the first derivative of the estimator T at the distribution F, taken in the direction of a point mass at x.


If $F_n$ is the empirical distribution function based on a sample $x_1, \ldots, x_n$, the behavior of the estimator T is described, to first order, by

$$T(F_n) \approx T(F) + \frac{1}{n}\sum_{i=1}^{n} IC(x_i; T, F), \qquad (2)$$

and thus

$$\sqrt{n}\,\big(T(F_n) - T(F)\big) \to N\!\left(0, \int IC(x; T, F)^2\, dF(x)\right). \qquad (3)$$

It is therefore evident from (2) that the influence curve describes the "influence" of a particular observation on $T(F_n)$.

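A simple example of an influence curve, given in Hampel, is the influence curve of the arithmetic mean, $T(F) = \int x\, dF(x)$, evaluated at any distribution F that has a finite first moment. If the mean of F is μ, then $T\big((1-\varepsilon)F + \varepsilon\delta_x\big) = (1-\varepsilon)\mu + \varepsilon x$, so the influence curve is

$$IC(x; T, F) = \lim_{\varepsilon \to 0}\frac{(1-\varepsilon)\mu + \varepsilon x - \mu}{\varepsilon} = x - \mu.$$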
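Definition (1) can also be evaluated directly with a small but finite ε. The sketch below does this for the mean of a discrete distribution. The discrete F, ε = 10⁻⁶ and the helper names are illustrative choices. The output reproduces x − μ, including its unbounded growth in x.

```python
def mean_functional(values, weights):
    """T(F) = integral of x dF(x) for a discrete F given by values and weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def influence_at(T, values, weights, x, eps=1e-6):
    """Finite-eps version of definition (1): contaminate F with mass eps at x."""
    mixed_values = list(values) + [x]
    mixed_weights = [(1.0 - eps) * w for w in weights] + [eps]
    return (T(mixed_values, mixed_weights) - T(values, weights)) / eps


values = [1.0, 2.0, 3.0, 6.0]     # hypothetical discrete F with mean 3
weights = [0.25] * 4
for x in (-5.0, 0.0, 3.0, 10.0, 100.0):
    print(x, influence_at(mean_functional, values, weights, x))   # close to x - 3
```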

Thus the influence of a point x on the arithmetic mean is a simple linear function of x that also depends on the mean μ of the distribution. This influence curve is unbounded, which implies that the arithmetic mean is not a robust estimator. A plot of this influence curve is given in Figure 1. The following figures give plots of the influence curves of several robust estimators. These robust estimators fall into two classes: trimmed means and M-estimators. Most of the results are taken either from a book by Andrews and several others (1972) or from a paper by Carroll and Wegman (1975).

One simple robust estimator of interest is the trimmed mean. The α-trimmed mean (for 0 ≤ α < .5) is found by ordering the observations in a sample, deleting the nα smallest and nα largest observations, and finding the arithmetic mean of the rest. The median of the sample is seen to be the limiting case, the .50-trimmed mean. To find the influence curve of the trimmed mean, we need the expression for the α-trimmed mean of any distribution F, which is (from Hampel)

$$T(F) = \frac{1}{1-2\alpha}\int_{\alpha}^{1-\alpha} F^{-1}(t)\, dt.$$
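The sketch below gives both forms of the trimmed mean: the sample version, and the functional T(F) approximated by a midpoint rule over the quantile function. Rounding nα down when it is not an integer, the number of midpoint cells and the example distribution are our assumptions.

```python
import math
from statistics import NormalDist


def trimmed_mean(sample, alpha):
    """Sample alpha-trimmed mean; floor(n*alpha) points dropped from each end
    (rounding n*alpha down is our assumption when it is not an integer)."""
    if not 0 <= alpha < 0.5:
        raise ValueError("need 0 <= alpha < 0.5")
    xs = sorted(sample)
    k = math.floor(len(xs) * alpha)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def trimmed_mean_functional(quantile, alpha, cells=10000):
    """T(F): integral of F^{-1}(t) over [alpha, 1 - alpha], divided by 1 - 2 alpha,
    approximated by the midpoint rule."""
    h = (1.0 - 2.0 * alpha) / cells
    total = sum(quantile(alpha + (i + 0.5) * h) for i in range(cells)) * h
    return total / (1.0 - 2.0 * alpha)


sample = [1.0, 2.0, 3.0, 4.0, 100.0]   # hypothetical sample with one outlier
print(trimmed_mean(sample, 0.0), trimmed_mean(sample, 0.2))
print(trimmed_mean_functional(NormalDist(5.0, 2.0).inv_cdf, 0.1))   # near 5
```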

The influence curve for the α-trimmed mean, in the special case where F is a distribution symmetric about zero, is given by

$$IC(x; T, F) = \begin{cases} F^{-1}(\alpha)/(1-2\alpha), & x < F^{-1}(\alpha), \\ x/(1-2\alpha), & F^{-1}(\alpha) \le x \le F^{-1}(1-\alpha), \\ F^{-1}(1-\alpha)/(1-2\alpha), & x > F^{-1}(1-\alpha). \end{cases} \qquad (4)$$

If the distribution F is asymmetric, the expression for the influence curve is more complicated. For α = 1/2 (the median), assuming that F has a density f which is symmetric about zero, the influence curve is

$$IC(x; T, F) = \frac{\operatorname{sign}(x)}{2 f(0)}. \qquad (5)$$

Thus the influence curve of the trimmed mean is bounded, so the trimmed mean is a robust estimator. A plot of the influence curve of the .10-trimmed mean is given in Figure 2.
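Formulas (4) and (5) can be evaluated directly at the standard normal, which is symmetric about zero as both formulas require. The sketch below uses `statistics.NormalDist` for $F^{-1}$ and f. The choice of the standard normal and α = .10 are illustrative. Clamping x to $[F^{-1}(\alpha), F^{-1}(1-\alpha)]$ reproduces the three cases of (4).

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(0.0, 1.0)


def ic_trimmed_mean(x, alpha, dist=STANDARD_NORMAL):
    """Influence curve (4) of the alpha-trimmed mean at a distribution
    symmetric about zero (here a NormalDist with mean 0)."""
    lo, hi = dist.inv_cdf(alpha), dist.inv_cdf(1.0 - alpha)
    return min(max(x, lo), hi) / (1.0 - 2.0 * alpha)


def ic_median(x, dist=STANDARD_NORMAL):
    """Influence curve (5) of the median: sign(x) / (2 f(0))."""
    sign = (x > 0) - (x < 0)
    return sign / (2.0 * dist.pdf(0.0))


for x in (-10.0, -1.0, 0.5, 3.0, 10.0):
    print(x, ic_trimmed_mean(x, 0.10), ic_median(x))
```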

Another class of robust estimators of interest is the class of M-estimators, which show very good robustness properties. As given in Carroll and Wegman, M-estimators are solutions, denoted by T, of an equation of the form

$$\sum_{j=1}^{n}\psi\!\left(\frac{x_j - T}{s}\right) = 0, \qquad (6)$$

where ψ is an odd function and s is a scale estimate. The scale estimate can either be found independently or from an equation of the form

$$\sum_{j=1}^{n}\chi\!\left(\frac{x_j - T}{s}\right) = 0, \qquad (7)$$

where χ is an even function. For Huber estimates, choose

$$\psi(x; k) = \begin{cases} -k, & x < -k, \\ x, & -k \le x \le k, \\ k, & x > k, \end{cases} \qquad (8)$$

use a specified function χ in (7), and solve simultaneously for T and s.


The same ψ function has also been used (by Hampel) along with $s = \operatorname{med}_i |x_i - x_{.50}| / .6745$, where $x_{.50}$ is the sample median, to give a different estimator T found by solving equation (6). These estimators depend on the value of k that is chosen. The influence curve of the general M-estimator in the case where F is symmetric is given in Andrews et al., along with the statement that the influence curve is much more complicated in the asymmetric case. The influence curve is proportional to the function ψ(x; k). A plot of the influence curve of the M-estimator with k = 1.5 is given in Figure 3.
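The sketch below implements the Hampel variant just described: Huber's ψ from (8) with k = 1.5, and the scale fixed at the median absolute deviation divided by .6745. Equation (6) is then solved for T. Because ψ is non-decreasing, the left side of (6) is non-increasing in T, so bisection between the smallest and largest observation finds a root. The solver and the data are illustrative assumptions.

```python
import statistics


def huber_psi(u, k=1.5):
    """psi(x; k) of (8): the identity on [-k, k], clipped to -k or k outside."""
    return max(-k, min(k, u))


def m_estimate(sample, psi=huber_psi, tol=1e-12):
    """Solve (6) for T with the scale s = med|x_i - x_.50| / .6745 held fixed."""
    med = statistics.median(sample)
    s = statistics.median([abs(x - med) for x in sample]) / 0.6745
    if s == 0:
        return med

    def h(t):
        # non-increasing in t because psi is non-decreasing
        return sum(psi((x - t) / s) for x in sample)

    lo, hi = min(sample), max(sample)   # h(lo) >= 0 >= h(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid in (lo, hi):             # no float left between lo and hi
            break
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


sample = [1.0, 2.0, 3.0, 4.0, 100.0]   # hypothetical sample with one outlier
print(statistics.mean(sample), statistics.median(sample), m_estimate(sample))
```

Here the outlier enters (6) only through the clipped value ψ = 1.5, so the estimate stays close to the bulk of the data while the mean is pulled to 22.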

The influence curves of the trimmed means and the Huber estimators, which have the same general shape, both give some influence to large observations. Hampel proposed an estimator T that gives zero influence to large observations. He used the median of the absolute deviations from the median, which he called the median deviation, as his scale estimate and chose ψ to be

$$\psi(x; a, b, c) = \begin{cases} x, & |x| \le a, \\ a\,\operatorname{sign}(x), & a < |x| \le b, \\ a\,\operatorname{sign}(x)\,\dfrac{c - |x|}{c - b}, & b < |x| \le c, \\ 0, & |x| > c. \end{cases} \qquad (9)$$

These estimates, which are called hampels, depend on the values of a, b and c that are chosen. Since ψ is zero for |x| greater than c, zero influence is given to an observation with |x| greater than c. A plot of the influence curve of the hampel estimator with a = 2.5, b = 4.5 and c = 9.5 is given in Figure 4.
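The sketch below codes the three-part ψ of (9) with the constants used for Figure 4. It prints it next to the Huber ψ for contrast: both are bounded, but only the hampel ψ returns to zero beyond c. The function names are illustrative.

```python
def hampel_psi(u, a=2.5, b=4.5, c=9.5):
    """Three-part redescending psi of (9); zero beyond |u| = c."""
    au = abs(u)
    sign = 1.0 if u >= 0 else -1.0
    if au <= a:
        return u
    if au <= b:
        return sign * a
    if au <= c:
        return sign * a * (c - au) / (c - b)
    return 0.0


def huber_psi(u, k=1.5):
    # for comparison: bounded, but never returns to zero
    return max(-k, min(k, u))


for u in (1.0, 3.0, 7.0, 9.5, 20.0):
    print(u, hampel_psi(u), huber_psi(u))
```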

These estimators of location are useful for illustrating the idea of influence curves because the results are fairly simple. One important point to notice is that the expressions for the influence curves become much simpler when the underlying distribution F is symmetric. In the first part of the paper, power transformations to normality were studied. It will now be useful to examine the influence curves of the estimators of λ given there.

First we consider the influence curve of the Box and Cox estimate of λ in the location problem. The Box and Cox estimate is the value maximizing the log likelihood given in equation (2) of the previous chapter. The influence curve of this estimate is derived in an unpublished work by Carroll. (This derivation is given in the Appendix.) The results are separated into two cases, λ > 0 and λ < 0. If λ > 0, the influence curve is of the order $y^{2\lambda}\log y$ as $y \to \infty$ and of the order $\log y$ as $y \to 0$. If λ < 0, the results are reversed, giving order $\log y$ as $y \to \infty$ and $y^{2\lambda}\log y$ as $y \to 0$. In the specific case λ = −1, the influence is of the order $y^{-2}\log y$ for observations near zero, so this estimate should be sensitive to quite small observations if λ = −1. A plot of the influence curve of the Box and Cox estimate with λ = −1 is given in Figure 5.

Next, the Andrews and Atkinson estimates are considered. The influence curve does not exist in general for the Andrews estimate, but some information can be found for λ = −1. In this case, more influence is given to observations near zero than is given by the Box and Cox estimate. The same type of calculation is used to find the general influence curve for the Atkinson estimate. If we again look at the case λ = −1, the influence of an observation near zero is found to be of the order $y^{-2}(\log y)^2$; thus the Atkinson estimate is also more sensitive to small observations than the Box and Cox maximum likelihood estimate.

The Hinkley estimate is also considered. The results are less complicated, so a general expression for the influence curve can be found. The influence curve is seen to depend on the derivative of the underlying distribution and on the value of p that is chosen. It is a bounded function with three discontinuities, so it is not as sensitive as the other estimates to large observations (if λ > 0) or to small observations (if λ < 0). It is still not preferable to the other estimates, though, because the results it gives are not particularly realistic for more complicated models.
CHAPTER IV

RESULTS OF SIMULATIONS

As stated previously, one of the main purposes of this paper is to study further the power of two of the tests presented in Chapter II by performing a series of simulations. These simulations were performed using the same method as in the paper by Atkinson.

In his paper, Atkinson describes simulations that were performed to study the power of three of the tests described in Chapter II: the Box and Cox maximum likelihood test $T_L$, Andrews' exact test $T_A$ and his own test $T_P$. The simulations were based on the data from the biological example in the Box and Cox paper. Here we chose to perform simulations using only the Andrews test $T_A$ and the Atkinson test $T_P$, because this made the computations easier.

Atkinson's simulations used the data from the Box and Cox example to generate normally distributed data. In order to study the robustness of the tests, we chose to generate random variates from three different distributions. The first of these was the normal, chosen to reproduce the results given in Atkinson's paper. The other two distributions have thicker tails than the normal. The first of them was a contaminated normal, 90 per cent N(0,1) and 10 per cent N(0,9). The other was the double exponential, which as generated had variance 2. All three types of random variates were multiplied by a factor of $.5/\sqrt{2}$. The factor $1/\sqrt{2}$ was chosen to give the double exponential a variance of 1, and the factor .5 in the numerator was chosen to imitate more closely the work done by Atkinson. Since $(.5/\sqrt{2})^2 = .125$, the resulting variances were .125 for the normal, .125 × (.9 × 1 + .1 × 9) = .225 for the contaminated normal and .25 for the double exponential.
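The variances quoted above can be checked by simulation. The sketch below generates the three types of error. The generators are our own assumption, since the thesis does not say how its variates were produced: `random.gauss` for the normal parts, and an exponential magnitude with a random sign for the double exponential with unit scale (variance 2). The seed and sample sizes are illustrative.

```python
import math
import random

SCALE = 0.5 / math.sqrt(2.0)   # the factor .5/sqrt(2); SCALE**2 = .125


def normal_error(rng):
    return SCALE * rng.gauss(0.0, 1.0)


def contaminated_normal_error(rng):
    # 90 per cent N(0, 1) and 10 per cent N(0, 9), i.e. standard deviation 3
    sd = 3.0 if rng.random() < 0.1 else 1.0
    return SCALE * rng.gauss(0.0, sd)


def double_exponential_error(rng):
    # Laplace with unit scale (variance 2): exponential magnitude, random sign
    magnitude = rng.expovariate(1.0)
    return SCALE * (magnitude if rng.random() < 0.5 else -magnitude)


def sample_variance(draws):
    m = sum(draws) / len(draws)
    return sum((d - m) ** 2 for d in draws) / len(draws)


rng = random.Random(1)   # fixed seed so the check is reproducible
for gen in (normal_error, contaminated_normal_error, double_exponential_error):
    print(gen.__name__, sample_variance([gen(rng) for _ in range(100000)]))
```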

The means for all three types of random variates were generated in the same way, directly from the Box and Cox biological data. The first step was to arrange the data into a 48 × 1 vector y; the data were then transformed into a vector $y^{(\lambda)}$. Since we were interested in testing the hypothesis λ = −1, this value was used in transforming the observations. The means used were the predicted means $\hat y^{(-1)} = X(X'X)^{-1}X'y^{(-1)}$.

In order to study the power of the two tests $T_A$ and $T_P$, the simulations were repeated with different values of λ. To accomplish this, the cell means were first generated by the method above with λ = −1. They were then transformed back to the original scale by inverting the Box and Cox transformation, $y = 1/(1 - \hat y^{(-1)})$, and transformed again using the new value of λ. The values of λ used were −1.5, −1, −.5, −.05 and .4. The results of these 15 simulations are given in Tables 1, 2 and 3. They are expressed as the number of the 200 simulations that gave significant test statistics for $T_A$ and $T_P$, for all three distributions. Three plots are also given in Figures 6, 7 and 8, one for each distribution. As stated in the paper by Atkinson, the slope of these plots indicates the power of the tests.
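The sketch below covers only the remapping of cell means described above. Means on the λ = −1 scale are mapped back to the original scale through the inverse transformation, then re-transformed with a new λ. The means and helper names are hypothetical, and how errors were then added is not spelled out here, so the sketch does not attempt it.

```python
import math


def box_cox(v, lam):
    # transformation (1) for a single positive value
    return math.log(v) if lam == 0 else (v ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Inverse of (1); for lam = -1 this is 1/(1 - z)."""
    if lam == 0:
        return math.exp(z)
    base = 1.0 + lam * z
    if base <= 0:
        raise ValueError("z is outside the range of the transform")
    return base ** (1.0 / lam)


def remap_means(means_at_minus_one, new_lam):
    """Move cell means from the lam = -1 scale to the scale of new_lam."""
    return [box_cox(inverse_box_cox(m, -1.0), new_lam) for m in means_at_minus_one]


means = [0.55, 0.62, 0.70]   # hypothetical cell means on the lam = -1 scale
for lam in (-1.5, -1.0, -0.5, -0.05, 0.4):
    print(lam, remap_means(means, lam))
```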

The results of these simulations agree with Atkinson's results for the normal case, since the power of the test $T_P$ is greater than the power of the test $T_A$ for all 5 values of λ. The results for the other two distributions are partly consistent with the normal case, because the power of $T_P$ is larger than the power of $T_A$ at all values of λ for both distributions. Away from the null hypothesis, though, there is a loss of efficiency, since the power of both $T_P$ and $T_A$ is lower than in the normal case. There is also a loss of validity at the null hypothesis, because the intended 5 per cent tests become closer to 30 per cent tests for the Atkinson test and 10 per cent tests for the Andrews test.
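A quick way to see that these departures are not simulation noise: for a valid 5 per cent test, the number of significant results out of 200 is binomial with mean 10 and standard deviation $\sqrt{200 \times .05 \times .95} \approx 3.08$. The sketch below measures how many standard deviations away a given count lies. The counts used are simply those implied by the rounded percentages quoted above (30 and 10 per cent of 200), not values read from the tables.

```python
import math


def rejection_summary(count, n_sims=200, nominal=0.05):
    """Observed rejection rate and its distance, in binomial standard deviations,
    from what a valid test at the nominal level would give."""
    rate = count / n_sims
    expected = n_sims * nominal
    sd = math.sqrt(n_sims * nominal * (1.0 - nominal))
    return rate, (count - expected) / sd


# Counts implied by the rounded percentages quoted above (30 and 10 per cent of 200)
for label, count in (("Atkinson", 60), ("Andrews", 20)):
    rate, z = rejection_summary(count)
    print(label, rate, round(z, 1))
```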


TABLE 1

Power of the two tests for testing λ = −1. Data generated from the normal distribution with λ = k. Number out of 200 simulations significant at the 5 per cent level.


TABLE 2

Power of the two tests for testing λ = −1. Data generated from the contaminated normal distribution with λ = k. Number out of 200 simulations significant at the 5 per cent level.
TABLE 3

Power of the two tests for testing λ = −1. Data generated from the double exponential distribution with λ = k. Number out of 200 simulations significant at the 5 per cent level.
FIGURE 6


Power of the two statistics for testing λ = −1 using normally distributed data. Proportion of 200 simulations significant at the 5 per cent level. A denotes $T_A$, Andrews' exact test, and D denotes $T_P$, Atkinson's test.
FIGURE 7. Power of the two test statistics for testing λ = −1 using contaminated normal data.


FIGURE 8. Power of the two test statistics for testing λ = −1 using double exponential data.
CHAPTER V


CONCLUSION

In  the  first  part  of  the  jxiper,  four  different  estimates  for  the 
optimal  value  of  X were  studied;  the  Box  and  Cox,  Andrews,  Atkinson 
and  llinkley  estimates.  Test  statistics  for  testing  \ = t(,  were  also 
derived  for  the  Box  and  Cox,  Andrews  and  Atkinson  cases.  The  results 
of  a lUimerical  example  in  the  paper  by  Atl;inson  gave  an  indication  of 
the  power  of  the  tliree  tests  T, -the  lk)x  and  Cox  test,  T,-the  Andrews 
test  and  T^-the  Atkinson  test  for  normally  distributed  data.  Atkinson 
concluded  that  his  statistic  Tj^  was  similar  in  power  to  the  statistic 
T|  and  that  both  were  greater  in  power  than  the  statistic  T^. 

A series of simulations was performed to extend the results given
in the paper by Atkinson. The purpose of these simulations was to
study the robustness, as well as the power, of the tests T₂ and T₃.
Whereas Atkinson used only normally distributed data, the simulations
here included normally distributed data, data from a contaminated normal
distribution and double exponential data. The results indicated that
the power of T₃ is greater than the power of T₂ for all three types
of distribution. For the contaminated normal and double exponential
distributions, however, the Atkinson test shows an extreme loss of
validity at the null hypothesis and the Andrews test shows a slight loss
of validity. Away from the null hypothesis, the Andrews test shows an
extreme loss of efficiency and the Atkinson test shows a slight loss
of efficiency.
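The counts in Tables 2 and 3 are out of 200 simulations, so a rough yardstick for judging a "loss of validity" is the binomial variation of the rejection count for a test whose true size is exactly 5 per cent. The sketch below is our addition and not part of the original study. It assumes independent simulations and uses an arbitrary one-sided 2.5 per cent cut-off. It computes the mean and standard deviation of that count, together with the smallest count that would be unusually high.

```python
from math import comb, sqrt


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(c, n=200, p=0.05):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(c, n + 1))


def rejection_yardstick(n=200, p=0.05, alpha=0.025):
    """Mean and SD of the rejection count under a valid size-p test, plus the
    smallest count c with P(X >= c) <= alpha (alpha is an illustrative choice)."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    c = 0
    while upper_tail(c, n, p) > alpha:
        c += 1
    return mean, sd, c


mean, sd, c = rejection_yardstick()
print(f"expected {mean:.1f} rejections (SD {sd:.2f}); counts >= {c} are unusually high")
```

With 200 replicates the standard deviation is about 3 rejections, so only fairly large departures from 10 rejections out of 200 are distinguishable from simulation noise.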




Since the contaminated normal and double exponential distributions
both have heavier tails than the normal, some problems were expected
when these distributions were considered. Because the Andrews test is an
exact t-test, its loss of efficiency was expected: the usual t-tests
display this loss of efficiency away from the null hypothesis. The loss
of validity of the Atkinson test should also be expected from examining
the influence curve, because quite large influence is given to both large
and small observations when λ = -1 (which is the null hypothesis). The
conclusion is therefore that the two tests are not very robust to
distributions with heavier tails than the normal, because of the
losses of validity and efficiency described above.


APPENDIX


The  Influence  Curve  for  the  Box  and  Cox  Estimate 


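The Box and Cox estimate is the value of λ that maximizes the log
likelihood function

$$L_{\max}(\lambda) = -\frac{n}{2}\log\hat\sigma^2(\lambda) + (\lambda - 1)\sum_{i=1}^{n}\log y_i,$$

where $\hat\sigma^2(\lambda) = n^{-1}\sum_{i=1}^{n}\bigl(y_i^{(\lambda)} - \bar y^{(\lambda)}\bigr)^2$ is the maximum likelihood estimate of the variance of the transformed observations.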
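As a concrete illustration, the sketch below evaluates $L_{\max}(\lambda)$ and maximizes it by a simple grid search over λ. It assumes NumPy and a grid on [-2, 2] with step 0.01. The grid search is our choice for transparency; it is not necessarily the optimizer used in the thesis. The illustrative data are exponentials of standard normal quantiles, so log y is (nearly exactly) normal and the estimate should fall at or beside λ = 0.

```python
import numpy as np
from statistics import NormalDist


def boxcox(y, lam):
    """Power transform y^(lam) = (y**lam - 1)/lam, with the log branch at lam = 0."""
    if abs(lam) < 1e-12:
        return np.log(y)
    return (y**lam - 1.0) / lam


def profile_loglik(y, lam):
    """L_max(lam) = -(n/2) log sigma_hat^2(lam) + (lam - 1) * sum(log y)."""
    z = boxcox(y, lam)
    sigma2 = np.mean((z - z.mean()) ** 2)  # ML variance estimate (divisor n)
    return -0.5 * len(y) * np.log(sigma2) + (lam - 1.0) * np.sum(np.log(y))


def boxcox_mle(y, lo=-2.0, hi=2.0, num=401):
    """Grid-search maximiser of L_max over [lo, hi] (step 0.01 by default)."""
    grid = np.linspace(lo, hi, num)
    values = [profile_loglik(y, lam) for lam in grid]
    return float(grid[int(np.argmax(values))])


# Illustrative data: exp of n = 500 standard normal quantiles.
n = 500
z_q = np.array([NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)])
y_lognormal = np.exp(z_q)
print(boxcox_mle(y_lognormal))
```

A finer grid or a one-dimensional optimizer would sharpen the estimate. The grid keeps the link to the formula explicit.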


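To maximize this function, we take the derivative with respect to λ, divide by n and
let n → ∞. Evaluating this derivative at the "true" value λ₀, we obtain

$$0 = \lim_{n\to\infty}\frac{1}{n}\frac{\partial L_{\max}(\lambda)}{\partial\lambda}\bigg|_{\lambda=\lambda_0} = E\log y - \frac{1}{S^2(F,\lambda_0)}\,E\left\{\bigl(y^{(\lambda_0)} - T(F,\lambda_0)\bigr)\frac{\partial y^{(\lambda)}}{\partial\lambda}\bigg|_{\lambda=\lambda_0}\right\},$$

where

$$T(F,\lambda_0) = E\,y^{(\lambda_0)}$$

and

$$S^2(F,\lambda_0) = E\bigl(y^{(\lambda_0)} - T(F,\lambda_0)\bigr)^2.$$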
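The finite-sample version of this derivative is an exact identity. It reads $n^{-1}\,\partial L_{\max}/\partial\lambda = \overline{\log y} - \overline{(z-\bar z)\,\partial z/\partial\lambda}/\hat\sigma^2$ with $z = y^{(\lambda)}$, because $\sum_i (z_i - \bar z) = 0$ removes the derivative of $\bar z$. The sketch below checks it numerically against a central finite difference of $L_{\max}$. It assumes NumPy, and the test data (50 evenly spaced points on [0.5, 4]) are arbitrary.

```python
import numpy as np


def boxcox(y, lam):
    """y^(lam), with the log branch at lam = 0."""
    if abs(lam) < 1e-12:
        return np.log(y)
    return (y**lam - 1.0) / lam


def dboxcox_dlam(y, lam):
    """d y^(lam) / d lam = y^lam log y / lam - (y^lam - 1) / lam^2 ((log y)^2/2 at 0)."""
    L = np.log(y)
    if abs(lam) < 1e-12:
        return L**2 / 2.0
    u = y**lam
    return (lam * u * L - u + 1.0) / lam**2


def profile_loglik(y, lam):
    """L_max(lam) = -(n/2) log sigma_hat^2(lam) + (lam - 1) * sum(log y)."""
    z = boxcox(y, lam)
    sigma2 = np.mean((z - z.mean()) ** 2)
    return -0.5 * len(y) * np.log(sigma2) + (lam - 1.0) * np.sum(np.log(y))


def score(y, lam):
    """(1/n) dL_max/dlam = mean(log y) - mean((z - zbar) dz/dlam) / sigma_hat^2."""
    z = boxcox(y, lam)
    r = z - z.mean()
    return np.mean(np.log(y)) - np.mean(r * dboxcox_dlam(y, lam)) / np.mean(r**2)


y = np.linspace(0.5, 4.0, 50)
h = 1e-5
for lam in (-1.0, -0.3, 0.7):
    fd = (profile_loglik(y, lam + h) - profile_loglik(y, lam - h)) / (2 * h * len(y))
    print(lam, score(y, lam), fd)
```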


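Now, to compute the influence curve, we let λ₀ = λ(F) for given F and
define the following functions:

$$\psi_1\bigl(y;T(F),S(F),\lambda(F)\bigr) = \frac{y^{(\lambda(F))} - T(F)}{S(F)},$$

$$\psi_2\bigl(y;T(F),S(F),\lambda(F)\bigr) = \frac{\bigl(y^{(\lambda(F))} - T(F)\bigr)^2}{S^2(F)} - 1,$$

$$\psi_3\bigl(y;T(F),S(F),\lambda(F)\bigr) = \log y - \frac{y^{(\lambda(F))} - T(F)}{S^2(F)}\,\frac{\partial y^{(\lambda)}}{\partial\lambda}\bigg|_{\lambda=\lambda(F)}.$$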

Then the functionals T(F), S(F) and λ(F) are the solutions to the system
of equations

$$\int \psi_i\bigl(y;T(F),S(F),\lambda(F)\bigr)\,dF(y) = 0, \qquad i = 1,2,3.$$

To derive the influence curve, we first need to define the distribution
functions

$$F_\varepsilon(y) = (1-\varepsilon)F_0(y) + \varepsilon\,\delta_x(y),$$

where

$$\delta_x(y) = \begin{cases} 0, & y < x,\\ 1, & y \ge x.\end{cases}$$

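Then the above system of equations becomes

$$\int \psi_i\bigl(y;T(F_\varepsilon),S(F_\varepsilon),\lambda(F_\varepsilon)\bigr)\,dF_\varepsilon(y) = 0,$$

which implies that

$$(1-\varepsilon)\int \psi_i\bigl(y;T(F_\varepsilon),S(F_\varepsilon),\lambda(F_\varepsilon)\bigr)\,dF_0(y) + \varepsilon\int \psi_i\bigl(y;T(F_\varepsilon),S(F_\varepsilon),\lambda(F_\varepsilon)\bigr)\,d\delta_x(y) = 0$$

or

$$(1-\varepsilon)\int \psi_i\bigl(y;T(F_\varepsilon),S(F_\varepsilon),\lambda(F_\varepsilon)\bigr)\,dF_0(y) + \varepsilon\,\psi_i\bigl(x;T(F_\varepsilon),S(F_\varepsilon),\lambda(F_\varepsilon)\bigr) = 0 .$$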

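Next we take derivatives with respect to ε and evaluate at ε = 0 to give

$$a_{1i}\frac{\partial T(F_\varepsilon)}{\partial\varepsilon}\bigg|_{\varepsilon=0} + a_{2i}\frac{\partial S(F_\varepsilon)}{\partial\varepsilon}\bigg|_{\varepsilon=0} + a_{3i}\frac{\partial \lambda(F_\varepsilon)}{\partial\varepsilon}\bigg|_{\varepsilon=0} = -\psi_i\bigl(x;T(F_0),S(F_0),\lambda(F_0)\bigr).$$

The term that comes from differentiating the factor (1-ε) vanishes, because $\int \psi_i\,dF_0 = 0$ at ε = 0. Here the $a_{1i}$, $a_{2i}$ and $a_{3i}$ are coefficients, found from the above equations, and each is the expected partial derivative of $\psi_i$ with respect to T(F), S(F) and λ(F) respectively.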


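Evaluating the expectations under F, and using $E\{y^{(\lambda(F))} - T(F)\} = 0$ and $E\{y^{(\lambda(F))} - T(F)\}^2 = S^2(F)$, the coefficients are as follows (all derivatives with respect to λ are evaluated at λ = λ(F)):

$$a_{11} = E\left\{-\frac{1}{S(F)}\right\} = -\frac{1}{S(F)}, \qquad a_{21} = E\left\{-\frac{y^{(\lambda(F))} - T(F)}{S^2(F)}\right\} = 0, \qquad a_{31} = \frac{1}{S(F)}\,E\left\{\frac{\partial y^{(\lambda)}}{\partial\lambda}\right\},$$

$$a_{12} = E\left\{-\frac{2\bigl(y^{(\lambda(F))} - T(F)\bigr)}{S^2(F)}\right\} = 0, \qquad a_{22} = E\left\{-\frac{2\bigl(y^{(\lambda(F))} - T(F)\bigr)^2}{S^3(F)}\right\} = -\frac{2}{S(F)}, \qquad a_{32} = \frac{2}{S^2(F)}\,E\left\{\bigl(y^{(\lambda(F))} - T(F)\bigr)\frac{\partial y^{(\lambda)}}{\partial\lambda}\right\},$$

$$a_{13} = \frac{1}{S^2(F)}\,E\left\{\frac{\partial y^{(\lambda)}}{\partial\lambda}\right\}, \qquad a_{23} = \frac{2}{S^3(F)}\,E\left\{\bigl(y^{(\lambda(F))} - T(F)\bigr)\frac{\partial y^{(\lambda)}}{\partial\lambda}\right\}, \qquad a_{33} = -\frac{1}{S^2(F)}\,E\left\{\left(\frac{\partial y^{(\lambda)}}{\partial\lambda}\right)^2 + \bigl(y^{(\lambda(F))} - T(F)\bigr)\frac{\partial^2 y^{(\lambda)}}{\partial\lambda^2}\right\},$$

where, for λ ≠ 0,

$$\frac{\partial y^{(\lambda)}}{\partial\lambda} = \frac{y^{\lambda}\log y}{\lambda} - \frac{y^{\lambda} - 1}{\lambda^2}, \qquad \frac{\partial^2 y^{(\lambda)}}{\partial\lambda^2} = \frac{y^{\lambda}(\log y)^2}{\lambda} - \frac{2y^{\lambda}\log y}{\lambda^2} + \frac{2(y^{\lambda} - 1)}{\lambda^3}.$$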
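Thus the above equations become

$$\begin{pmatrix} -\dfrac{1}{S(F)} & 0 & a_{31} \\ 0 & -\dfrac{2}{S(F)} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}
\begin{pmatrix} \partial T(F_\varepsilon)/\partial\varepsilon \\ \partial S(F_\varepsilon)/\partial\varepsilon \\ \partial \lambda(F_\varepsilon)/\partial\varepsilon \end{pmatrix}_{\varepsilon=0}
= -\begin{pmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \end{pmatrix},$$

where $\psi_i = \psi_i\bigl(x;T(F_0),S(F_0),\lambda(F_0)\bigr)$.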

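To solve the matrix equation and find the influence curve, take the
inverse of the matrix to give

$$\begin{pmatrix} \partial T(F_\varepsilon)/\partial\varepsilon \\ \partial S(F_\varepsilon)/\partial\varepsilon \\ \partial \lambda(F_\varepsilon)/\partial\varepsilon \end{pmatrix}_{\varepsilon=0}
= -\begin{pmatrix} -\dfrac{1}{S(F)} & 0 & a_{31} \\ 0 & -\dfrac{2}{S(F)} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}^{-1}
\begin{pmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \end{pmatrix},$$

where $\psi_i = \psi_i\bigl(x;T(F_0),S(F_0),\lambda(F_0)\bigr)$.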
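The matrix solution can be checked numerically. The sketch below assumes NumPy and takes $F_0$ to be a discrete distribution with equal weight on n = 200 points. The points are chosen (our illustrative choice, not the thesis's data) so that $y^{(-1)} = 1 - 1/y$ equals 0.25 times the standard normal quantiles, which puts λ(F₀) near -1. The function `influence_curve` builds the 3 × 3 matrix from the coefficients $a_{ji}$ and solves for the three influence curves at x. The accompanying check compares the result with $(\theta(F_\varepsilon) - \theta(F_0))/\varepsilon$ for a tiny ε. Two further assumptions apply: the λ-equation is solved by bisection on [-3, 0.5], which is assumed to bracket a single root, and a Taylor series is used for $y^{(\lambda)}$ and its derivatives when |λ| < 10⁻⁴.

```python
import numpy as np
from statistics import NormalDist


def boxcox_derivs(y, lam):
    """y^(lam) and its first two lam-derivatives (Taylor series near lam = 0)."""
    L = np.log(y)
    if abs(lam) < 1e-4:
        z = L + lam * L**2 / 2 + lam**2 * L**3 / 6
        z1 = L**2 / 2 + lam * L**3 / 3 + lam**2 * L**4 / 8
        z2 = L**3 / 3 + lam * L**4 / 4
        return z, z1, z2
    u = y**lam
    z = (u - 1.0) / lam
    z1 = u * L / lam - (u - 1.0) / lam**2
    z2 = u * L**2 / lam - 2.0 * u * L / lam**2 + 2.0 * (u - 1.0) / lam**3
    return z, z1, z2


def lam_score(y, w, lam):
    """sum_j w_j psi_3(y_j) with T and S set to their weighted solutions."""
    z, z1, _ = boxcox_derivs(y, lam)
    T = np.sum(w * z)
    S2 = np.sum(w * (z - T) ** 2)
    return np.sum(w * np.log(y)) - np.sum(w * (z - T) * z1) / S2


def solve_functionals(y, w, lo=-3.0, hi=0.5, iters=200):
    """(T, S, lam) solving sum_j w_j psi_i(y_j) = 0, i = 1, 2, 3 (bisection on lam)."""
    g_lo = lam_score(y, w, lo)
    if g_lo * lam_score(y, w, hi) > 0:
        raise ValueError("no sign change of the lambda equation on [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g_mid = lam_score(y, w, mid)
        if g_mid * g_lo > 0:
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    z, _, _ = boxcox_derivs(y, lam)
    T = np.sum(w * z)
    return T, np.sqrt(np.sum(w * (z - T) ** 2)), lam


def influence_curve(x, y, w):
    """Solve A @ (IC_T, IC_S, IC_lam) = -psi(x), row i of A being (a_1i, a_2i, a_3i)."""
    T, S, lam = solve_functionals(y, w)
    z, z1, z2 = boxcox_derivs(y, lam)
    r = z - T
    a31, a32 = np.sum(w * z1) / S, 2.0 * np.sum(w * r * z1) / S**2
    a13, a23 = np.sum(w * z1) / S**2, 2.0 * np.sum(w * r * z1) / S**3
    a33 = -np.sum(w * (z1**2 + r * z2)) / S**2
    A = np.array([[-1.0 / S, 0.0, a31],
                  [0.0, -2.0 / S, a32],
                  [a13, a23, a33]])
    zx, z1x, _ = boxcox_derivs(np.array([float(x)]), lam)
    rx = zx[0] - T
    psi = np.array([rx / S, rx**2 / S**2 - 1.0, np.log(x) - rx * z1x[0] / S**2])
    return np.linalg.solve(A, -psi)


# Illustrative F_0: equal weights on y with 1 - 1/y = 0.25 * (normal quantiles).
n = 200
zq = 0.25 * np.array([NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)])
y0 = 1.0 / (1.0 - zq)
w0 = np.full(n, 1.0 / n)
print(influence_curve(3.0, y0, w0))  # (IC_T, IC_S, IC_lambda) at x = 3
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly. Mathematically this is the same as the inverse-matrix expression above.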

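Now we see that the influence curves are functions of the $\psi_i$, where

$$\psi_1\bigl(y;T(F_0),S(F_0),\lambda(F_0)\bigr) = \frac{y^{(\lambda(F_0))} - T(F_0)}{S(F_0)}$$

is of order $y^{\lambda}$,

$$\psi_2\bigl(y;T(F_0),S(F_0),\lambda(F_0)\bigr) = \frac{\bigl(y^{(\lambda(F_0))} - T(F_0)\bigr)^2}{S^2(F_0)} - 1$$

is of order $y^{2\lambda}$, and

$$\psi_3\bigl(y;T(F_0),S(F_0),\lambda(F_0)\bigr) = \log y - \frac{y^{(\lambda(F_0))} - T(F_0)}{S^2(F_0)}\,\frac{\partial y^{(\lambda)}}{\partial\lambda}\bigg|_{\lambda=\lambda(F_0)}$$

is of order $y^{2\lambda}\log y$.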

Therefore, if λ > 0, the influence curve is of order $y^{2\lambda}\log y$ as $y \to \infty$
and of order $|\log y|$ as $y \to 0$. If λ < 0, the results are reversed, giving order
$\log y$ as $y \to \infty$ and order $|y^{2\lambda}\log y|$ as $y \to 0$. An indication of these results
is given in the plot of the influence curve for λ = -1 in Figure 5.
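The growth rates can be checked directly on $\psi_3$, which dominates the influence curve. The sketch below assumes NumPy and uses hypothetical values T = 0 and S = 1 for the functionals. It divides $\psi_3$ by the claimed order at $y = 10^{100}$ and $y = 10^{-100}$. On the $y^{2\lambda}\log y$ side the ratio should approach $-1/(\lambda^2 S^2)$; on the $\log y$ side it should approach 1. Convergence is slow, roughly like $1/|\log y|$, which is why such extreme y values are used.

```python
import numpy as np


def psi3(y, lam, T, S):
    """psi_3 = log y - (y^(lam) - T) * (d y^(lam)/d lam) / S^2, for lam != 0."""
    L = np.log(y)
    u = y**lam
    z = (u - 1.0) / lam
    dz = (lam * u * L - u + 1.0) / lam**2
    return L - (z - T) * dz / S**2


def order_ratios(lam, T=0.0, S=1.0, tiny=1e-100, huge=1e100):
    """Return (ratio as y -> infinity, ratio as y -> 0) of psi_3 to the claimed order."""
    def fast(y):  # the y^(2 lam) log y rate
        return y ** (2 * lam) * np.log(y)

    if lam > 0:
        return psi3(huge, lam, T, S) / fast(huge), psi3(tiny, lam, T, S) / np.log(tiny)
    return psi3(huge, lam, T, S) / np.log(huge), psi3(tiny, lam, T, S) / fast(tiny)


for lam in (-1.0, 0.5):
    print(lam, order_ratios(lam))
```

For λ = -1 the $y \to 0$ ratio tends to -1, so $\psi_3$ grows like $y^{-2}|\log y|$ for small observations. This is consistent with the large influence of small observations noted in the conclusion.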